Practical Session 9: The relation between two time series

Expected learning outcomes

By the end of this session, participants should be able to:

  • assess and interpret associations with external variables

  • identify and interpret effect modification between external variables and the outcome variable in surveillance data

Task 9.1

Using the aragon dataset, assess the effect of ambient temperature (using average weekly maximum temperature) on mortality in the autonomous community of Aragón. Is this effect uniform throughout the year?

Help for Task 9.1

Open the dataset aragon.csv. There you will find mean maximum temperature and mortality data for the autonomous community of Aragon by week.

Show the code
aragon <- import(here("data", "aragon.csv")) 

Convert the data to a time series and plot both variables (weekly mean maximum temperature and mortality data).

Show the code
# Create ts dataframe with yearweek index
aragonz <- aragon %>% 
  mutate(
    year_week = make_yearweek(year = year, week = week),
    index = seq.int(from = 1, to = nrow(aragon))
  ) %>% 
  as_tsibble(index = index)

# Temperature plot   
aragonz_tmax_plot <- ggplot(data = aragonz) +
  geom_point(mapping = aes(x = year_week, y = tmax), colour = "blue", alpha = 0.5) +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Week", 
       y = "Maximum Temperature") +
  tsa_theme

# Mortality plot
aragonz_cases_plot <- ggplot(data = aragonz) +
  geom_point(mapping = aes(x = year_week, y = cases), colour = "green", alpha = 0.5) +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Week", 
       y = "Weekly case counts") +
  tsa_theme

# Using patchwork to join the two plots
# The "/" determines an above/below display
aragonz_two_plot <- (aragonz_tmax_plot / aragonz_cases_plot) 
aragonz_two_plot

Generate variables for sine and cosine for annual oscillation.

Show the code
aragonz <- aragonz %>% 
  mutate(sin52 = sin(2 * pi * date / 52),
         cos52 = cos(2 * pi * date / 52))

Fit a poisson regression model with a simple trend to the weekly number of deaths, accounting for seasonality.

Show the code
aragmodel1 <- glm(cases ~ index + sin52 + cos52,
                           family = "poisson",
                           data = aragonz)

summary(aragmodel1)

Call:
glm(formula = cases ~ index + sin52 + cos52, family = "poisson", 
    data = aragonz)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 5.252e+00  6.312e-03 832.117  < 2e-16 ***
index       1.260e-04  2.082e-05   6.052 1.43e-09 ***
sin52       4.442e-02  4.426e-03  10.037  < 2e-16 ***
cos52       1.038e-01  4.417e-03  23.500  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1726.7  on 519  degrees of freedom
Residual deviance: 1046.5  on 516  degrees of freedom
AIC: 4756.3

Number of Fisher Scoring iterations: 4

Plot the predicted weekly number of deaths.

Show the code
ggplot(data = aragonz) +
  geom_point(mapping = aes(x = year_week, y = cases), alpha = 0.2) +
  geom_line(mapping = aes(x = year_week, y = aragmodel1$fitted.values), 
            colour = "darkorange", 
            alpha = 0.7, 
            lwd = 1.5) +
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_yearweek(date_labels = "%Y-%W", 
                   date_breaks = "1 year") +
  labs(x = "Week", 
       y = "Weekly case counts", 
       title = "Cases with fitted trend") +
  tsa_theme

Discuss how your model fits the data.

Note: When you plotted the weekly number of deaths in Aragon, perhaps you noticed that mortality peaks in winter; however, there are smaller peaks during the summer, too. You can account for that in two different ways: a) include two sine/cosine functions in the model (one for 52-week cycles and one for 26-week cycles), or b) take into account the fact that temperature might be behaving differently as a risk factor in winter than it does in summer (sign of effect modification?).

You decide to perform your analysis twice; once for winter and once for the rest of the time. You define winter as the time between weeks 49 and 8.

Show the code
aragonz <- aragonz %>% 
    mutate(winter = (week >= 49 | week <= 8))

First run the model including winter as a main effect, and then including an interaction term with temperature.

Show the code
# Model with only main effect
aragmodel2 <- glm(cases ~ index + sin52 + cos52 + winter,
                           family = "poisson",
                           data = aragonz)

# Model with temperature interaction
aragmodel3 <- glm(cases ~ index + sin52 + cos52 + winter * tmax,
                           family = "poisson",
                           data = aragonz)

We compare both model’s AIC below.

Show the code
# Easiest, most direct way
AIC(aragmodel2); AIC(aragmodel3)

We can also create a fancier table using the library gtsummary like in practical 4. We use again exponentiate = T to exponentiate coefficients and style_number() to specify the number of digits. In addition, we use one extra argument: add_glance_table() allows to add key parameters at the bottom of the table. And table_merge() allows to take several tables built with tbl_regression and put them in one big table. This makes it easy to compare models.

Show the code
tbl_m2 <- aragmodel2 %>% 
  tbl_regression(exponentiate = TRUE,
                 estimate_fun = ~style_number(.x, digits = 4)) %>%
  add_glance_table(include = c(AIC, BIC, deviance))

tbl_m3 <- aragmodel3 %>% 
  tbl_regression(exponentiate = TRUE,
                 estimate_fun = ~style_number(.x, digits = 4)) %>%
  add_glance_table(include = c(AIC, BIC, deviance))

tbl_m2m3 <- 
  tbl_merge(
    tbls = list(tbl_m2, tbl_m3),
    tab_spanner = c("aragmodel2", "aragmodel3")
  )
tbl_m2m3
Characteristic
aragmodel2
aragmodel3
IRR 95% CI p-value IRR 95% CI p-value
index 1.0001 1.0001, 1.0002 <0.001 1.0001 1.0001, 1.0002 <0.001
sin52 1.0403 1.0311, 1.0495 <0.001 1.0621 1.0508, 1.0735 <0.001
cos52 1.0783 1.0651, 1.0917 <0.001 1.1721 1.1385, 1.2066 <0.001
winter





    FALSE

    TRUE 1.0673 1.0461, 1.0889 <0.001 1.2688 1.2020, 1.3392 <0.001
AIC 4,718

4,666

BIC 4,739

4,696

Deviance 1,006

950

tmax


1.0078 1.0054, 1.0103 <0.001
winter * tmax





    TRUE * tmax


0.9849 0.9806, 0.9893 <0.001
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Note that winter * tmax tells R to include terms in the model for winter, tmax and for their interaction. If we just wanted to include the interaction term without its corresponding “main effects”, i.e. without winter or tmax, we would use winter:tmax instead.

Discuss the output of the two models. Does the interpretation change when winter is taken into account as an interaction term along with temperature?

Task 9.2 (Optional)

It has been argued by experts that low temperatures in winter might be “slow killers”; that is, very low temperatures in winter do not result in a peak in mortality on the same day/week as they are observed, but rather after some time has elapsed. On the other hand, very high temperatures in summer are “fast killers”; they are associated with peaks in mortality very fast, i.e. heat waves kill people fast. Can you find evidence for this for Aragón?

Help for Task 9.2

It has been argued by experts that low temperatures in winter might be “slow killers”; that would mean that very low temperatures in winter do not result in a peak in mortality on the same day/week when they are observed, but rather after some time. On the other hand, very high temperatures in summer are fast killers; they are associated with peaks in mortality very fast, i.e. heat waves kill people fast.

For this reason, you decide to check whether temperature has an effect on mortality with some lag. stats::lag is the default base R function to compute lags from the stats package. In the example below, we prefer to use one of the tidyverse package alternatives,namely dplyr::lag. We specify the package name here to avoid possible conflicts with other packages which have a lag function.

We are using the lag function from the dplyr package. For a given week, the value of tmax is lagged by 1 in respect to the previous week’s value.

Show the code
# Create lag variable
aragonz <- aragonz %>% 
  mutate(L1.tmax = dplyr::lag(x = tmax, n = 1))

Create a new model named aragmodel4 including the lag variable, and compare it with the previous model aragmodel3. What can you conclude from it?

Show the code
aragmodel4 <- glm(cases ~ index + sin52 + cos52 + winter * tmax + L1.tmax,
                           family = "poisson",
                           data = aragonz)

summary(aragmodel4)

Call:
glm(formula = cases ~ index + sin52 + cos52 + winter * tmax + 
    L1.tmax, family = "poisson", data = aragonz)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)      5.087e+00  3.198e-02 159.076  < 2e-16 ***
index            1.246e-04  2.097e-05   5.942 2.81e-09 ***
sin52            5.679e-02  6.343e-03   8.954  < 2e-16 ***
cos52            1.504e-01  1.663e-02   9.043  < 2e-16 ***
winterTRUE       2.268e-01  2.795e-02   8.113 4.94e-16 ***
tmax             8.194e-03  1.304e-03   6.283 3.32e-10 ***
L1.tmax         -1.226e-03  1.146e-03  -1.070    0.285    
winterTRUE:tmax -1.433e-02  2.278e-03  -6.291 3.15e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1708.45  on 518  degrees of freedom
Residual deviance:  943.72  on 511  degrees of freedom
  (1 observation deleted due to missingness)
AIC: 4654.1

Number of Fisher Scoring iterations: 4
Show the code
AIC(aragmodel3); AIC(aragmodel4)
[1] 4665.836
[1] 4654.111

Instead of running the model with an interaction term, you could run the analyses for winter and the rest of your dataset separately.

The interaction term was not significant, but you are still curious about the differential effect of temperatures on mortality in winter and the other seasons. You decide to split the data and run two different models for winter and non-winter mortality and temperatures, and compare them.

Think carefully about how you will compare them: are theAIC criteria still valid for this task?

Show the code
## winter subset
aragonz.winter <- 
    aragonz %>% 
    filter(winter == TRUE)

## not winter subset
aragonz.notwinter <- 
    aragonz %>% 
    filter(winter == FALSE)
Show the code
# Winter model
aragmodel5 <- glm(cases ~ index + sin52 + cos52 + L1.tmax,
                           family = "poisson",
                           data = aragonz.winter)

# Not winter model
aragmodel6 <- glm(cases ~ index + sin52 + cos52 + L1.tmax,
                           family = "poisson",
                           data = aragonz.notwinter)

# Table summaries
tbl_m5 <- aragmodel5 %>% 
  tbl_regression(exponentiate = TRUE,
                 estimate_fun = ~style_number(.x, digits = 4))

tbl_m6 <- aragmodel6 %>% 
  tbl_regression(exponentiate = TRUE,
                 estimate_fun = ~style_number(.x, digits = 4))

# Comparison table
tbl_m5m6 <- 
  tbl_merge(
    tbls = list(tbl_m5, tbl_m6),
    tab_spanner = c("Winter model", "Not winter model")
  )
tbl_m5m6
Characteristic
Winter model
Not winter model
IRR 95% CI p-value IRR 95% CI p-value
index 1.0002 1.0001, 1.0003 <0.001 1.0001 1.0001, 1.0002 <0.001
sin52 1.1944 1.1428, 1.2486 <0.001 1.0485 1.0348, 1.0624 <0.001
cos52 1.7229 1.4361, 2.0675 <0.001 1.1094 1.0785, 1.1412 <0.001
L1.tmax 0.9923 0.9879, 0.9967 <0.001 1.0031 1.0006, 1.0055 0.015
Abbreviations: CI = Confidence Interval, IRR = Incidence Rate Ratio

Discuss the output of the different models. Which one would you go for?